So far, we’ve looked at these plots using a one-sided (directional) null. Let’s look at a two-sided (non-directional) null.
| | Fisher | Neyman-Pearson |
|---|---|---|
| Role of the p-value/critical value | A continuous measure of the strength of evidence against H₀ | A decision-making tool that compares results to a threshold |
| Error rates | Does not focus on binary decision making | Controls Type I and Type II errors through α and β |
| Philosophy | Inductive inference; aims to measure evidence without making definitive decisions | Frequentist perspective; focuses on long-run error rates and decision rules |
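To make the Neyman-Pearson idea of a "long-run error rate" concrete, here is a minimal simulation sketch. It assumes a one-sample z-test with known \(\sigma = 1\) on normally distributed data, and the helper names (`two_sided_z_pvalue`, `type_i_error_rate`) are hypothetical. When \(H_0\) is true and we reject whenever \(p < \alpha\), the fraction of rejections across many repeated experiments should settle near \(\alpha\).

```python
import math
import random

def two_sided_z_pvalue(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for a one-sample z-test (assumes sigma is known)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_error_rate(alpha=0.05, n=30, n_sims=20_000, seed=1):
    """Fraction of simulated experiments, all with H0 true, in which we reject."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true: mean is 0
        if two_sided_z_pvalue(sample) < alpha:  # binary decision rule
            rejections += 1
    return rejections / n_sims

print(type_i_error_rate())  # should land close to 0.05
```

Each individual experiment still ends in a reject/do-not-reject decision; \(\alpha\) only describes how often that rule errs over many repetitions.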
I’m sorry.
In NHST, we assume \(H_0\) is true and try to provide evidence against it using reductio ad absurdum. We calculate a p-value and use it to decide whether \(H_0\) should be rejected.
If \(H_0\) were true, the data we observed would be absurd (very unlikely); therefore, we will act as if \(H_0\) is false.
Modern NHST mixes the Fisherian and Neyman-Pearson frameworks: p-values are used as a binary decision-making tool, but they are also often treated as continuous measures of evidence at the same time. This hybrid leads to misuse and misconceptions.
\[ p \lt 0.05 \]
Review: \(p = P(\text{test statistic at least as extreme as the one observed} \mid H_0)\)
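A minimal sketch of this definition, assuming the test statistic is a z statistic that follows a standard normal distribution under \(H_0\). The function name `p_value_from_z` and its `alternative` argument are illustrative, not a specific library's API. It also shows how one-sided and two-sided p-values differ: the two-sided version counts equally extreme values in both tails.

```python
import math

def p_value_from_z(z, alternative="two-sided"):
    """p-value for an observed z statistic, assuming Z ~ N(0, 1) under H0."""
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return 1.0 - upper_tail  # P(Z <= z)
    if alternative == "two-sided":
        return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)
    raise ValueError(f"unknown alternative: {alternative!r}")

print(p_value_from_z(1.96, "greater"))    # about 0.025
print(p_value_from_z(1.96, "two-sided"))  # about 0.05
```

Because the standard normal is symmetric, the two-sided p-value is twice the one-sided p-value whenever the statistic falls in the hypothesized direction.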
p-values are not the probability the null is true
p-values are not the probability the effect will replicate
non-significant p-values do not mean that the null is true
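The last point can be checked with a small simulation sketch. It assumes normally distributed data with known \(\sigma = 1\), a true mean of 0.3 (so \(H_0: \mu = 0\) is false), and only 20 observations per experiment; these numbers and the function name `fraction_nonsignificant` are arbitrary choices for illustration. Because the test is underpowered, most experiments come out non-significant even though the null is false.

```python
import math
import random

def fraction_nonsignificant(true_mean=0.3, n=20, alpha=0.05, n_sims=10_000, seed=2):
    """Share of experiments with p >= alpha even though H0 (mean = 0) is false."""
    rng = random.Random(seed)
    nonsig = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]  # H0 is FALSE here
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))  # z statistic, known sigma = 1
        p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
        if p >= alpha:
            nonsig += 1
    return nonsig / n_sims

print(fraction_nonsignificant())  # roughly 0.73 (power is only about 0.27)
```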
❓If I use \(\alpha = 0.05\) and I run 20 tests, what is the probability that I get at least one significant p-value?
❓If I use \(\alpha = 0.05\), I run 20 independent tests, and the null is true for every one of them, what is the probability that I get at least one significant p-value?
\[ \text{FWER} = 1 - \underbrace{0.95^{20}}_{P(\text{all non-sig})} \approx 64\% \]
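This calculation can be checked by simulation. The sketch assumes the 20 tests are independent and every null is true, in which case each p-value is uniformly distributed on [0, 1] (true for continuous test statistics). The function names are hypothetical.

```python
import random

def fwer_analytic(alpha=0.05, m=20):
    """P(at least one p < alpha) across m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m

def fwer_simulated(alpha=0.05, m=20, n_sims=50_000, seed=3):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Under a true H0, a p-value from a continuous test is Uniform(0, 1)
        p_values = [rng.random() for _ in range(m)]
        if min(p_values) < alpha:  # at least one "significant" result
            hits += 1
    return hits / n_sims

print(round(fwer_analytic(), 4))  # 0.6415
print(fwer_simulated())           # close to 0.64
```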
To correct for this inflated family-wise error rate (FWER), we often use a Bonferroni or Sidak correction:
Bonferroni: \(p_{thresh} = \frac{\alpha}{m}\); where \(m\) is the number of tests
Sidak: \(p_{thresh} = 1 - (1-\alpha)^{\frac{1}{m}}\); where \(m\) is the number of tests
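A small sketch of the two threshold formulas (function names are illustrative). It only computes the per-test thresholds; working out the FWER they produce is left to the exercise that follows.

```python
def bonferroni_threshold(alpha, m):
    """Per-test threshold alpha / m."""
    return alpha / m

def sidak_threshold(alpha, m):
    """Per-test threshold 1 - (1 - alpha)^(1/m)."""
    return 1 - (1 - alpha) ** (1 / m)

for m in (1, 5, 20):
    print(f"m={m:>2}  Bonferroni={bonferroni_threshold(0.05, m):.5f}  "
          f"Sidak={sidak_threshold(0.05, m):.5f}")
```

For a given \(m > 1\), the Sidak threshold is slightly larger (less strict) than the Bonferroni one.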
Exercise for the listener: What is the FWER using these new thresholds? How does it change as \(m \to \infty\)?
Remember, non-significant p-values do not mean that the null is true
Claim: there are no black swans.